library(tidyverse)## ── Attaching packages ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.6 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.4 ✓ stringr 1.4.0
## ✓ readr 2.0.2 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(ggplot2)
library(tidyr)
library(dplyr)
library(readr)
library(tableone)
library(sp)
library(sf)## Linking to GEOS 3.8.1, GDAL 3.2.1, PROJ 7.2.1
library(leaflet)
library(geojsonsf)
library(RColorBrewer)
library(tigris)## To enable
## caching of data, set `options(tigris_use_cache = TRUE)` in your R script or .Rprofile.
library(tidycensus)##
## Attaching package: 'tidycensus'
## The following object is masked from 'package:tigris':
##
## fips_codes
Due to a rise in Chlamydia infections over the past decade, the importance of routine sexually transmitted infection (STI) testing has been emphasized (CDC, 2021b). Access to healthcare facilities may deter some folks from receiving routine tests. Using publicly available data through the City of Chicago, we will be analyzing the chlamydia rates across the cities neighborhoods, time, and sex to see if there is an association between these factors and chlamydia incidence. I have consulted with the following faculty on my project and discussed:
Dr. Kate Wallis - advised me to look for potential racial disparities as is associated with the neighborhood demographics in the city of Chicago.
Dr. Alison Buttenheim - Discussed what kinds of analysis I could look at that are useful with rates of tesitng. We discussed difficulties there might be in mapping if the geographic data isn’t already in the data set and other options to continue with an interesting project.
Dr. Frances Shofer - advised me in my methodology that would be appropriate for the datasets I have.
The github repository for this project can be found here: https://github.com/CaitMcCrory/BMIN503_Final_Project
Over the past 6 years, STI rates have reached record high levels in the United States (CDC, 2021b). Chlamydia is a sexually transmitted infection (STI) that is oftentimes asymptomatic which is thought to encourage further spread. According to the Centers for Disease Control and Prevention (CDC), 90% of males and 70-95% of females are asymptomatic. Chlamydia is known to cause some harmful side effects including pelvic inflammatory disease and which sometimes leads to infertility in people with uteruses (CDC, 2021a). Fortunately, chlamydia can be treated with antibiotics when it is detected. Detection of chlamydia requires STI testing. In the United States, sexually active females under the age of 25, pregnant women, and men who have sex with men (MSM) are the groups that are recommended to be screened annually (CDC, 2021c). In this project we aim to give the context of chlamydia in Chicago to inform future interventions and evaluate if the city follows national trends.. By assessing the ways that sex, neighborhood, and time affect the chlamydia rates, we can analyze current positivity and testing trends in the city. Improving rates of STI testing involves health communication, access to health care, and risk communication. This project is of interdisciplinary nature seeing that there is data analytics, healthcare access, behavioral economics, and provider interactions components to it.
The data used here are from the city of Chicago open access health data. The data includes cases, incidence + 95% confidence intervals, community area number, and community area name. These data are available for both males and females from 2000-2014 in separate but mirrored data frames (Weaver, 2021a and Weaver, 2021b). An additionally dataset was brought in, also from the city of Chicago, to provide geospatial data needed to map chlamydia rates (City of Chicago, 2018). For the purposes of this project, the data will be cleaned and analyzed to describe trends in incidence by sex and by neighborhood using R-studio (R Core Team, 2021):
First we are calling in data directly from the city of Chicago website.
#raw data for males will be male.chlm and raw data for females will be female.chlm
male.chlm.url <- url("https://data.cityofchicago.org/api/views/35yf-6dy3/rows.csv?accessType=DOWNLOAD")
male.chlm <- read_csv(male.chlm.url)## Rows: 79 Columns: 63
## ── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Community Area Name, WARNING
## dbl (61): Community Area, CASES 2000 Male 15-44, Incidence Rate 2000, Incidence Rate 2000 Lower CI, Incidence Rate 2000 Upper CI, CASES 2001 Male 15-44, Incidence Rate 2001, Incidence Rate 2001 Lower CI, Incidence Rate 2001 Upper CI, CASES 2002 Male 15-44, Incidence Rate 2002, Incidence Rate 2002 Lower CI, Incidence Rate 2002 Upper CI, CASES 2003 Male 15-44, Incidence Rate 2003, Incidence Rate...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
female.chlm.url <- url("https://data.cityofchicago.org/api/views/bz6k-73ti/rows.csv?accessType=DOWNLOAD")
female.chlm <- read_csv(female.chlm.url)## Rows: 79 Columns: 63
## ── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Community Area Name, WARNING
## dbl (61): Community Area, Cases 2000 Female 15-44, Incidence Rate 2000, Incidence Rate 2000 Lower CI, Incidence Rate 2000 Upper CI, Cases 2001 Female 15-44, Incidence Rate 2001, Incidence Rate 2001 Lower CI, Incidence Rate 2001 Upper CI, Cases 2002 Female 15-44, Incidence Rate 2002, Incidence Rate 2002 Lower CI, Incidence Rate 2002 Upper CI, Cases 2003 Female 15-44, Incidence Rate 2003, Incide...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#this is the shape data from the City of Chicago brought in as a GeoJSON and will be transformed into sf dataframe later in part 2.
geoJSON <- rgdal::readOGR("https://data.cityofchicago.org/api/geospatial/bbvz-uum9?method=export&format=GeoJSON")## OGR data source with driver: GeoJSON
## Source: "https://data.cityofchicago.org/api/geospatial/bbvz-uum9?method=export&format=GeoJSON", layer: "OGRGeoJSON"
## with 98 features
## It has 4 fields
Second we clean the data and isolate variables into new data frames with the neccessary variables for further analysis. We exclude the 95% confidence intervals here and are exclusively reviewing incidence rates with their point estimate.
#This data cleaning r-chunk will set up the dataframes for part 1 of the data analysis
#removing confidence intervals and cases
#creating a data frame that is just chicago aggregate data for incidence rates + removing community area and commuity area name so we can make a bar plot (note to self incase I forget, chicago's community area is 88 and is the entire city of chicago as the community area name)
chi.female.incid <- female.chlm %>%
dplyr::filter(`Community Area Name` == "Chicago") %>% #the filter function pulls only rows that are labeled "chicago"
dplyr::select(`Incidence Rate 2000`,
`Incidence Rate 2001`,
`Incidence Rate 2002`,
`Incidence Rate 2003`,
`Incidence Rate 2004`,
`Incidence Rate 2005`,
`Incidence Rate 2006`,
`Incidence Rate 2007`,
`Incidence Rate 2008`,
`Incidence Rate 2009`,
`Incidence Rate 2010`,
`Incidence Rate 2011`,
`Incidence Rate 2012`,
`Incidence Rate 2013`,
`Incidence Rate 2014`) %>%
dplyr::rename(`2000` = `Incidence Rate 2000`,
`2001` = `Incidence Rate 2001`,
`2002` = `Incidence Rate 2002`,
`2003` = `Incidence Rate 2003`,
`2004` = `Incidence Rate 2004`,
`2005` = `Incidence Rate 2005`,
`2006` = `Incidence Rate 2006`,
`2007` = `Incidence Rate 2007`,
`2008` = `Incidence Rate 2008`,
`2009` = `Incidence Rate 2009`,
`2010` = `Incidence Rate 2010`,
`2011` = `Incidence Rate 2011`,
`2012` = `Incidence Rate 2012`,
`2013` = `Incidence Rate 2013`,
`2014` = `Incidence Rate 2014`)
#repeated for male sheet
chi.male.incid <- male.chlm %>%
dplyr::filter(`Community Area Name` == "Chicago") %>%
dplyr::select(`Incidence Rate 2000`,
`Incidence Rate 2001`,
`Incidence Rate 2002`,
`Incidence Rate 2003`,
`Incidence Rate 2004`,
`Incidence Rate 2005`,
`Incidence Rate 2006`,
`Incidence Rate 2007`,
`Incidence Rate 2008`,
`Incidence Rate 2009`,
`Incidence Rate 2010`,
`Incidence Rate 2011`,
`Incidence Rate 2012`,
`Incidence Rate 2013`,
`Incidence Rate 2014`) %>%
dplyr::rename(`2000` = `Incidence Rate 2000`,
`2001` = `Incidence Rate 2001`,
`2002` = `Incidence Rate 2002`,
`2003` = `Incidence Rate 2003`,
`2004` = `Incidence Rate 2004`,
`2005` = `Incidence Rate 2005`,
`2006` = `Incidence Rate 2006`,
`2007` = `Incidence Rate 2007`,
`2008` = `Incidence Rate 2008`,
`2009` = `Incidence Rate 2009`,
`2010` = `Incidence Rate 2010`,
`2011` = `Incidence Rate 2011`,
`2012` = `Incidence Rate 2012`,
`2013` = `Incidence Rate 2013`,
`2014` = `Incidence Rate 2014`)
#flipping the data frames so they can be plotted and analyzed
chi.female.flipped <- t(chi.female.incid)
chi.female.flipped <- as.data.frame(chi.female.flipped)
#renaming V1 to be meaningful
colnames(chi.female.flipped) <- "Incidence"
chi.female.flipped <- rownames_to_column(chi.female.flipped, var = "year")
#this code mirrors what we did above to the female sheets
chi.male.flipped <- t(chi.male.incid)
chi.male.flipped <- as.data.frame(chi.male.flipped)
colnames(chi.male.flipped) <- "Incidence"
chi.male.flipped <- rownames_to_column(chi.male.flipped, var = "year")
#adding sex as a variable and then binding the dataframes on top of one another
chi.female.incid1 <- chi.female.flipped %>%
dplyr::mutate(sex = "female")
chi.male.incid1 <- chi.male.flipped %>%
dplyr::mutate(sex = "male")
chi.annual.rates.sex <- bind_rows(chi.female.incid1, chi.male.incid1)#this data cleaning r-chunk will set up the dataframes for part 2 and 3 of data analysis
#join the dataframes longways 79 community areas
#creating our new dataframes to do longitudinal analysis of incidence rates
#female dataframe
incid.female <- female.chlm %>%
dplyr::select(`Incidence Rate 2000`, `Incidence Rate 2001`, `Incidence Rate 2002`, `Incidence Rate 2003`, `Incidence Rate 2004`, `Incidence Rate 2005`, `Incidence Rate 2006`, `Incidence Rate 2007`, `Incidence Rate 2008`, `Incidence Rate 2009`, `Incidence Rate 2010`, `Incidence Rate 2011`, `Incidence Rate 2012`, `Incidence Rate 2013`, `Incidence Rate 2014`, `Community Area`, `Community Area Name`) %>%
dplyr::rename(
rate_00 = `Incidence Rate 2000`,
rate_01 = `Incidence Rate 2001`,
rate_02 = `Incidence Rate 2002`,
rate_03 = `Incidence Rate 2003`,
rate_04 = `Incidence Rate 2004`,
rate_05 = `Incidence Rate 2005`,
rate_06 = `Incidence Rate 2006`,
rate_07 = `Incidence Rate 2007`,
rate_08 = `Incidence Rate 2008`,
rate_09 = `Incidence Rate 2009`,
rate_10 = `Incidence Rate 2010`,
rate_11 = `Incidence Rate 2011`,
rate_12 = `Incidence Rate 2012`,
rate_13 = `Incidence Rate 2013`,
rate_14 = `Incidence Rate 2014`) %>%
dplyr::mutate(sex = "female")
#male dataframe
incid.male <- male.chlm %>%
dplyr::select(`Incidence Rate 2000`, `Incidence Rate 2001`, `Incidence Rate 2002`, `Incidence Rate 2003`, `Incidence Rate 2004`, `Incidence Rate 2005`, `Incidence Rate 2006`, `Incidence Rate 2007`, `Incidence Rate 2008`, `Incidence Rate 2009`, `Incidence Rate 2010`, `Incidence Rate 2011`, `Incidence Rate 2012`, `Incidence Rate 2013`, `Incidence Rate 2014`, `Community Area`, `Community Area Name`) %>%
dplyr::rename(
rate_00 = `Incidence Rate 2000`,
rate_01 = `Incidence Rate 2001`,
rate_02 = `Incidence Rate 2002`,
rate_03 = `Incidence Rate 2003`,
rate_04 = `Incidence Rate 2004`,
rate_05 = `Incidence Rate 2005`,
rate_06 = `Incidence Rate 2006`,
rate_07 = `Incidence Rate 2007`,
rate_08 = `Incidence Rate 2008`,
rate_09 = `Incidence Rate 2009`,
rate_10 = `Incidence Rate 2010`,
rate_11 = `Incidence Rate 2011`,
rate_12 = `Incidence Rate 2012`,
rate_13 = `Incidence Rate 2013`,
rate_14 = `Incidence Rate 2014`) %>%
dplyr::mutate(sex = "male")
#rates are labeled by their year (ie. 2000 is rate_00, 2001 is rate_01...)
#Note to self: since this is how the incidence rates are labeled in the df, you will need to rename the column names in the bar chart to be 2000, 2001, 2002.....
#join them longways 79 community areas
### using bind_rows will give you a "long table" instead of a "wide table" which is what I was trying to do initially (this will be useful for capstone data as well)
incid.rates.clean <- bind_rows(incid.female, incid.male)
chicago.annual.rates <- incid.rates.clean %>%
filter(`Community Area Name` == "Chicago") %>%
select(rate_00,
rate_01,
rate_02,
rate_03,
rate_04,
rate_05,
rate_06,
rate_07,
rate_08,
rate_09,
rate_10,
rate_11,
rate_12,
rate_13,
rate_14,
sex)#the TableOne package will be used here to create a description of variables for the project.
## Vector of variables to summarize
var.t1 <- c("Sex", "Incidence", "Year")
## Vector of categorical variables that need transformation
catVars.t1 <- c("Sex", "Year")
## Create a TableOne object
#table #1 is doing, mean, SD, median, and quartiles for the incidence rates by sex#using a line plot in ggplot to show the longitudinal differences between males and females
#ggplot(data=chi.annual.rates.sex, aes(x = year, y = Incidence, color = sex)) +
# geom_point() +
# geom_line() +
# ggtitle("Chlamydia Incidence of Chicago in 2000")+
# scale_y_continuous(name="Incidence", limits=c(0, 3000))#checking for normality with histograms
summary(chi.annual.rates.sex$Incidence)## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 459.6 1018.1 1561.3 1677.1 2434.1 2870.9
#not normal distribution
wilcox.test(data = chi.annual.rates.sex, `Incidence` ~ `sex`)##
## Wilcoxon rank sum exact test
##
## data: Incidence by sex
## W = 225, p-value = 1.289e-08
## alternative hypothesis: true location shift is not equal to 0
#p<0.05
#running means for each sex for the results section
mean(chi.male.incid1$Incidence)## [1] 934.2533
mean(chi.female.incid1$Incidence)## [1] 2420.013
To analyze the trends in Community Area over time, I am mapping incidence for three time points within the collection period, 2000, 2007, and 2014. For the purposes of this project, I will use only the female incidence as male chlamydia incidence is often under-counted as a result of under-testing in the male population (Knight et al., 2016).
#mapping with the shape data
#we need to rename the Community Area Name in the female chlamydia dataframe to be "pri_neigh" so we can join on that variable with the shape file for mapping
geom.chlm.cleaned <- incid.female %>%
select(`Community Area Name`,
`rate_00`,
`rate_07`,
`rate_14`) %>%
filter(!row_number() %in% c(78, 79)) %>%
rename(`pri_neigh` = `Community Area Name`)
#need to make the geoJSON into a sf file so that you can merge and preserve the shape data. If you join the dataframes prior to converting geoJSON to sf, you lose the shape data as it will become character/numeric which is not compatable with either ggplot or leaflet
geoJSON <- st_as_sf(geoJSON)
geom.chlm.cleaned <- inner_join(geoJSON, geom.chlm.cleaned, by = "pri_neigh") #I THINK THIS IS THE ONE
# Select a color palette with which to run the palette function
pal_fun <- colorNumeric("BuPu", NULL) # Blue-Purple from RColorBrewer
my_theme <- function() {
theme_minimal() +
theme(axis.line = element_blank(),
axis.text = element_blank(),
axis.title = element_blank(),
panel.grid = element_line(color = "white"),
legend.key.size = unit(0.8, "cm"),
legend.text = element_text(size = 16),
legend.title = element_text(size = 16),
plot.title = element_text(size = 22))
}
myPalette <- colorRampPalette(brewer.pal(9, "BuPu"))
#leaflet code (interactive map)
#first you need to set the popup messages
# base of leaflet function with arguments Pop-up message
#pop_up with year 2000 rates
pu_message00 <- paste0(geom.chlm.cleaned$pri_neigh, # paste0 to append tract name with other relevant text
"<br>Chlamydia Incidence: ", # <br> forces new line
# use round function to round continuous poverty rate to one decimal point
round(geom.chlm.cleaned$rate_00, 1))
#pop_up with year 2007 rates
pu_message07 <- paste0(geom.chlm.cleaned$pri_neigh, # paste0 to append tract name with other relevant text
"<br>Chlamydia Incidence: ", # <br> forces new line
# use round function to round continuous poverty rate to one decimal point
round(geom.chlm.cleaned$rate_07, 1))
#pop_up with year 2014 rates
pu_message14 <- paste0(geom.chlm.cleaned$pri_neigh, # paste0 to append tract name with other relevant text
"<br>Chlamydia Incidence: ", # <br> forces new line
# use round function to round continuous poverty rate to one decimal point
round(geom.chlm.cleaned$rate_14, 1))#Looking at the female incidence rates, we are going to break the dataframe into halves to evaluate the first half of the time collection period (2000-2007) and compare it to the second half (2007-2014).
var.t2 <- c("rate_00", "rate_07", "rate_14")
summ.stats.table <- CreateTableOne(data = incid.female, vars = var.t2)
summary(summ.stats.table)##
## ### Summary of continuous variables ###
##
## strata: Overall
## n miss p.miss mean sd median p25 p75 min max skew kurt
## rate_00 79 3 4 2132 1857 1163 582 3723 92 5936 0.6 -1.0
## rate_07 79 3 4 2699 2358 1386 875 4801 173 7663 0.8 -0.8
## rate_14 79 1 1 2914 2259 2016 1081 4690 416 8818 0.8 -0.4
midpoint_analysis_df <- incid.female %>%
select(rate_00, rate_07, rate_14) %>%
mutate(time00_07 = rate_07 - rate_00) %>%
mutate(time07_14 = rate_14, rate_07) %>%
filter(!row_number() %in% c(78, 79)) #this line excludes the unknown and Chicago aggregate row so the values are not skewed by the aggregate.
var.t3 <- c("time00_07", "time07_14")
change_over_time <- CreateTableOne(data = midpoint_analysis_df, vars = var.t3, testNonNormal = wilcox.test)
summary(change_over_time)##
## ### Summary of continuous variables ###
##
## strata: Overall
## n miss p.miss mean sd median p25 p75 min max skew kurt
## time00_07 77 2 3 569 956 379 61 994 -2623 3820 0.3 2.3
## time07_14 77 0 0 2919 2273 1973 1068 4737 416 8818 0.8 -0.4
wilcox.test(Pair(time00_07, time07_14) ~ 1, data = midpoint_analysis_df)##
## Wilcoxon signed rank test with continuity correction
##
## data: Pair(time00_07, time07_14)
## V = 0, p-value = 5.388e-14
## alternative hypothesis: true location shift is not equal to 0
#change over time here shows that the mean +/- SD for the first 7 years is 569 +/- 956 and for the second 7 years is 2919 +/- 2273 table1 <- CreateTableOne(vars = var.t1, data = chi.annual.rates.sex, factorVars = catVars.t1, strata = c("sex"))## Warning in ModuleReturnVarsExist(vars, data): The data frame does not have: Sex Year Dropped
## Warning in ModuleReturnVarsExist(factorVars, data): The data frame does not have: Sex Year Dropped
summary(table1)##
## ### Summary of continuous variables ###
##
## sex: female
## n miss p.miss mean sd median p25 p75 min max skew kurt
## Incidence 15 0 0 2420 314 2451 2220 2679 1869 2871 -0.2 -0.9
## ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
## sex: male
## n miss p.miss mean sd median p25 p75 min max skew kurt
## Incidence 15 0 0 934 209 1015 768 1075 460 1254 -0.7 0.3
##
## p-values
## pNormal pNonNormal
## Incidence 4.275561e-15 3.066978e-06
##
## Standardize mean differences
## 1 vs 2
## Incidence 5.570504
table1## Stratified by sex
## female male p test
## n 15 15
## Incidence (mean (SD)) 2420.01 (313.86) 934.25 (209.21) <0.001
ggplot(data=chi.annual.rates.sex, aes(x = year, y = Incidence, color = sex)) +
geom_point() +
geom_line() +
ggtitle("Chlamydia Incidence of Chicago in 2000")+
scale_y_continuous(name="Incidence", limits=c(0, 3000))## geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?
Throughout the 14 year data collection period, the line plot shows that chlamydia incidence was higher in women.
To test whether the differences depicted in part 1a are meaningful, we ran a Wilcoxon Rank Sum Test. The average incidence of chlamydia is significantly higher in females than in males in Chicago between the years of 2000-2014 (W = 225, p = 1.289e-08). The average incidence in males over the 14 year period is 934.2533 per 100,000 as compared to 2420.013 per 100,000 in females over the same period.
table3 <- CreateTableOne(vars = var.t1, data = chi.annual.rates.sex, strata = c("sex"), testNonNormal = wilcox.test)## Warning in ModuleReturnVarsExist(vars, data): The data frame does not have: Sex Year Dropped
table3 ## Stratified by sex
## female male p test
## n 15 15
## Incidence (mean (SD)) 2420.01 (313.86) 934.25 (209.21) <0.001
In this section, we assessed the incidence of chlamydia by neighborhood using the interactive leaflet mapping tool. Looking at the maps across all three time periods, it is evident that the neighborhoods with consistently high chlamydia incidence pool in the South Side and the neighborhoods with consisently low incidence are predominantly in the North Side.
#leaflet with year 2000 rates
leaflet(geom.chlm.cleaned) %>%
addPolygons(stroke = FALSE,
fillColor = ~pal_fun(rate_00),
fillOpacity = 1, smoothFactor = 0.5,
popup = pu_message00) %>%
addProviderTiles("CartoDB.Positron") %>%
addLegend("bottomleft",
pal = pal_fun,
values = ~rate_00,
title = 'Female Chlamydia Incidence (per 100,000) in 2000',
opacity = 1) %>%
addScaleBar()#leaflet with year 2007 rates
leaflet(geom.chlm.cleaned) %>%
addPolygons(stroke = FALSE,
fillColor = ~pal_fun(rate_07),
fillOpacity = 1, smoothFactor = 0.5,
popup = pu_message07) %>%
addProviderTiles("CartoDB.Positron") %>%
addLegend("bottomleft",
pal = pal_fun,
values = ~rate_00,
title = 'Female Chlamydia Incidence (per 100,000) in 2007',
opacity = 1) %>%
addScaleBar()#leaflet with year 2014 rates
leaflet(geom.chlm.cleaned) %>%
addPolygons(stroke = FALSE,
fillColor = ~pal_fun(rate_14),
fillOpacity = 1, smoothFactor = 0.5,
popup = pu_message00) %>%
addProviderTiles("CartoDB.Positron") %>%
addLegend("bottomleft",
pal = pal_fun,
values = ~rate_00,
title = 'Female Chlamydia Incidence (per 100,000) in 2014',
opacity = 1) %>%
addScaleBar()To test if there is a difference in incidence between the first 7 years of the data collection (2000-2007) vs the second 7 years of data collection (2007-2014) we ran a Wilcoxon Rank Sum Test. The average incidence of chlamydia in females across all neighborhoods is significantly higher in the second half of data collection (p = 5.388e-14). The average incidence for females from 2000-2007 was 569 +/- 956 in comparison to the average incidence from 2007-2014 which was 2919 +/- 2273.
Given the recommendation practices and high rates of asymptomatic people with chlamydia, we would expect to see significantly higher rates of chlamydia in females as compared to males. This finding is consistent with previous literature that found the STI programs focus more on female health than male (Knight et al., 2016). Although there was no hypothesis testing done evaluating the association between incidence and neighborhoods, the maps visually show that there is consistently higher incidence in the south in comparison to the north. In the city of Chicago, the South Side is home to more low-income communities and communities of color in comparison to the rest of the city (Statistical Atlas, 2015). This concept is also consistent with chlamydia and STI trends in the United States which are known to disproportionately impact people of color and low-income folks (CDC, 2021b). It bears repeating that the mapping done here is exploratory in nature and is limited in ability to make conclusions due to the limited data and geographic information available from the city of Chicago. One of the biggest limitations in the mapping is the lack of data in some neighborhoods on account of the city organizing areas by neighborhood names which are fluid and subject to change from year to year creating gaps in the maps produced. Chicago showed the same upward trend in chlamydia cases that is observed in the rest of the country. The average incidence of chlamydia in females across the city went up from 569 to 2919 before and after 2007 respectively. During this project, limitations popped-up making certain aspects unfeasible. The largest limitation is a lack of complete geospatial mapping ability. Since I was working with publicly available data that collect geographic data with neighborhood names rather than zip code, block, or census track, some mapping was not possible. Despite this limitation here, it could be a direction for future projects to investigate. In addition to mapping the rates by geography and assessing for race and income - factors we know from previous literature to be structurally impacting some groups more than others - mapping rates with an overlay of testing sites and healthcare coverage would be useful. Per the recommendation of Dr. Kate Wallis, using the chlamydia rates by location and mapping neighborhood demographics, the testing site locations, and health insurance coverage could be an excellent way to visualize why some area’s have higher incidence rates than others. Mapping testing sites and health insurance coverage may serve as a proxy to evaluate levels of access to screening and subsequent treatment.
As someone newer to both data analysis and coding, I needed a lot of help building up this project and as such would like to sincerely thank everyone who helped me on this project and in my journey this semester learning how to code in R-studio!
I am so grateful to those who helped me and cannot wait to continue to learn R-studio and put the new skill to use!
Centers for Disease Control and Prevention (2021). Reported STD’s Reach All-time High for 6th Consecutive Year. Retrieved
from: https://www.cdc.gov/media/releases/2021/p0413-stds.html
Centers for Disease Control and Prevention (2021). What STD Tests Should I Get? Retrieved from: https://www.cdc.gov/std/prevention/screeningreccs.htm
City of Chicago (2018). Boundaries - Neighborhoods. Retrieved from: https://data.cityofchicago.org/Facilities-Geographic-Boundaries/Boundaries-Neighborhoods/bbvz-uum9
Knight R., Falasinnu T., Oliffe J., Gilbert M., Small W., Goldenberg S., and Shoveller J. (2016). Integrating gender and sex to unpack trends in sexually transmitted infection surveillance data in British Columbia, Canada: an ethno-epidemiological study. BMJ 6(8). doi: 10.1136/bmjopen-2016-011209.
R Core Team (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
Statistical Atlas (2015). South Chicago Demographics and Statistics. Retrieved from: https://statisticalatlas.com/neighborhood/Illinois/Chicago/South-Chicago/Household-Income
Weaver K. (2021) Public Health Statistics - Chlamydia cases among females aged 15-44 Chicago, by year, 2000-2014. City of Chicago - Health and Human Services. Retrieved from: https://data.cityofchicago.org/Health-Human-Services/Public-Health-Statistics-Chlamydia-cases-among-fem/bz6k-73ti/data?no_mobile=true
Weaver K. (2021) Public Health Statistics - Chlamydia cases among males aged 15-44 Chicago, by year, 2000-2014. City of Chicago - Health and Human Services. Retrieved from: https://data.cityofchicago.org/Health-Human-Services/Public-Health-Statistics-Chlamydia-cases-among-mal/35yf-6dy3